NSF PAR Search | NSF Public Access Repository

SymbolFit: Automatic Parametric Modeling with Symbolic Regression

https://doi.org/10.1007/s41781-025-00140-9

Tsoi, Ho_Fung; Rankin, Dylan; Caillol, Cecile; Cranmer, Miles; Dasu, Sridhara; Duarte, Javier; Harris, Philip; Lipeles, Elliot; Loncar, Vladimir (July 2025, Computing and Software for Big Science)

Abstract We introduce SymbolFit (API: https://github.com/hftsoi/symbolfit), a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data while simultaneously providing uncertainty estimates in a single run. Traditionally, constructing a parametric model to accurately describe binned data has been a manual and iterative process, requiring an adequate functional form to be determined before the fit can be performed. The main challenge arises when the appropriate functional forms cannot be derived from first principles, especially when there is no underlying true closed-form function for the distribution. In this work, we develop a framework that automates and streamlines the process by utilizing symbolic regression, a machine learning technique that explores a vast space of candidate functions without requiring a predefined functional form because the functional form itself is treated as a trainable parameter, making the process far more efficient and effortless than traditional regression methods. We demonstrate the framework in high-energy physics experiments at the CERN Large Hadron Collider (LHC) using five real proton-proton collision datasets from new physics searches, including background modeling in resonance searches for high-mass dijet, trijet, paired-dijet, diphoton, and dimuon events. We show that our framework can flexibly and efficiently generate a wide range of candidate functions that fit a nontrivial distribution well using a simple fit configuration that varies only by random seed, and that the same fit configuration, which defines a vast function space, can also be applied to distributions of different shapes, whereas achieving a comparable result with traditional methods would have required extensive manual effort.
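The abstract describes treating the functional form itself as something to search over, then fitting binned data. The Python sketch below illustrates that idea in a heavily simplified form. It is not SymbolFit's API or its search algorithm. Open-ended symbolic regression is replaced here by brute-force enumeration of four hypothetical candidate forms. Each form is fitted to binned counts with Poisson-like per-bin uncertainties, and the forms are ranked by $\chi^2$. The candidate list and the name `search_forms` are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical, tiny "function space": each candidate model is
# y = exp(a + b * g(x)) for one fixed transform g. Real symbolic regression
# searches an open-ended space of expressions instead of a fixed list.
CANDIDATES = {
    "exp(a + b*x)": lambda x: x,
    "exp(a + b*log(x))": np.log,      # equivalent to a power law
    "exp(a + b*sqrt(x))": np.sqrt,
    "exp(a + b/x)": lambda x: 1.0 / x,
}


def search_forms(x, counts):
    """Fit each candidate form to binned counts and rank by chi-squared.

    x: bin centres (must be > 0); counts: bin contents (must be > 0).
    Returns a list of (name, chi2, (a, b)) sorted from best to worst.
    """
    x = np.asarray(x, dtype=float)
    counts = np.asarray(counts, dtype=float)
    if np.any(x <= 0) or np.any(counts <= 0):
        raise ValueError("x and counts must be positive")
    sigma = np.sqrt(counts)  # Poisson-like per-bin uncertainty
    results = []
    for name, g in CANDIDATES.items():
        # Linear fit of log(counts) against g(x); the uncertainty on log(y)
        # is about sigma / y, and polyfit expects weights of 1 / uncertainty.
        b, a = np.polyfit(g(x), np.log(counts), 1, w=counts / sigma)
        pred = np.exp(a + b * g(x))
        chi2 = float(np.sum(((counts - pred) / sigma) ** 2))
        results.append((name, chi2, (float(a), float(b))))
    return sorted(results, key=lambda r: r[1])
```

Suppose the counts are generated from an exact exponential, such as `1000 * np.exp(-0.5 * x)`. In that case `search_forms` should rank `"exp(a + b*x)"` first, with $\chi^2$ close to zero. The log-linear parameterization keeps every fit a weighted linear least-squares problem. That is a simplification for this sketch, not a description of how SymbolFit fits its candidates.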
Symbolic Regression with a Learned Concept Library

Grayeli, Arya; Sehgal, Atharva; Costilla-Reyes, Omar; Cranmer, Miles; Chaudhuri, Swarat (December 2024, Annual Conference on Neural Information Processing Systems 2024 (NeurIPS 2024))

We present a novel method for symbolic regression (SR), the task of searching for compact programmatic hypotheses that best explain a dataset. The problem is commonly solved using genetic algorithms; we show that we can enhance such methods by inducing a library of abstract textual concepts. Our algorithm, called LaSR, uses zero-shot queries to a large language model (LLM) to discover and evolve concepts occurring in known high-performing hypotheses. We discover new hypotheses using a mix of standard evolutionary steps and LLM-guided steps (obtained through zero-shot LLM queries) conditioned on discovered concepts. Once discovered, hypotheses are used in a new round of concept abstraction and evolution. We validate LaSR on the Feynman equations, a popular SR benchmark, as well as on a set of synthetic tasks. On these benchmarks, LaSR substantially outperforms a variety of state-of-the-art SR approaches based on deep learning and evolutionary algorithms. Moreover, we show that LaSR can be used to discover a new and powerful scaling law for LLMs.
Augmenting astrophysical scaling relations with machine learning: Application to reducing the Sunyaev–Zeldovich flux–mass scatter

https://doi.org/10.1073/pnas.2202074120

Wadekar, Digvijay; Thiele, Leander; Villaescusa-Navarro, Francisco; Hill, J. Colin; Cranmer, Miles; Spergel, David N.; Battaglia, Nicholas; Anglés-Alcázar, Daniel; Hernquist, Lars; Ho, Shirley (March 2023, Proceedings of the National Academy of Sciences)

Complex astrophysical systems often exhibit low-scatter relations between observable properties (e.g., luminosity, velocity dispersion, oscillation period). These scaling relations illuminate the underlying physics, and can provide observational tools for estimating masses and distances. Machine learning can provide a fast and systematic way to search for new scaling relations (or for simple extensions to existing relations) in abstract high-dimensional parameter spaces. We use a machine learning tool called symbolic regression (SR), which models patterns in a dataset in the form of analytic equations. We focus on the Sunyaev–Zeldovich flux–cluster mass relation ($Y_{\rm SZ}$–$M$), the scatter in which affects inference of cosmological parameters from cluster abundance data. Using SR on the data from the IllustrisTNG hydrodynamical simulation, we find a new proxy for cluster mass which combines $Y_{\rm SZ}$ and the concentration of ionized gas ($c_{\rm gas}$): $M \propto Y_{\rm conc}^{3/5} \equiv Y_{\rm SZ}^{3/5}(1 - A\,c_{\rm gas})$. $Y_{\rm conc}$ reduces the scatter in the predicted $M$ by $\sim 20$–$30\%$ for large clusters ($M \gtrsim 10^{14}\,h^{-1}\,M_\odot$), as compared to using just $Y_{\rm SZ}$. We show that the dependence on $c_{\rm gas}$ is linked to cores of clusters exhibiting larger scatter than their outskirts. Finally, we test $Y_{\rm conc}$ on clusters from CAMELS simulations and show that $Y_{\rm conc}$ is robust against variations in cosmology, subgrid physics, and cosmic variance. Our results and methodology can be useful for accurate multiwavelength cluster mass estimation from upcoming CMB and X-ray surveys like ACT, SO, eROSITA and CMB-S4.
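The mass proxy $M \propto Y_{\rm SZ}^{3/5}(1 - A\,c_{\rm gas})$ can be evaluated directly. The abstract does not give the fitted coefficient $A$ or the proportionality constant. The Python sketch below therefore takes `a` as an input and uses the proxy only for mass ratios, where the unknown normalization cancels. The function names are hypothetical.

```python
def y_conc_mass_proxy(y_sz: float, c_gas: float, a: float) -> float:
    """Return Y_conc^(3/5) = Y_SZ^(3/5) * (1 - A * c_gas).

    The cluster mass is proportional to this value. The fitted coefficient A
    is not stated in the abstract, so it is passed in as `a` (an assumption).
    """
    if y_sz <= 0:
        raise ValueError("Y_SZ must be positive")
    return y_sz ** 0.6 * (1.0 - a * c_gas)


def mass_ratio(y1: float, c1: float, y2: float, c2: float, a: float) -> float:
    """M1 / M2 implied by the proxy; the unknown normalization cancels."""
    return y_conc_mass_proxy(y1, c1, a) / y_conc_mass_proxy(y2, c2, a)
```

Take two clusters with the same gas concentration. If one has twice the $Y_{\rm SZ}$ of the other, the proxy gives a mass ratio of $2^{3/5} \approx 1.52$, whatever value `a` takes.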
The SZ flux-mass ($Y$–$M$) relation at low-halo masses: improvements with symbolic regression and strong constraints on baryonic feedback

https://doi.org/10.1093/mnras/stad1128

Wadekar, Digvijay; Thiele, Leander; Hill, J. Colin; Pandey, Shivam; Villaescusa-Navarro, Francisco; Spergel, David N.; Cranmer, Miles; Nagai, Daisuke; Anglés-Alcázar, Daniel; Ho, Shirley; et al (April 2023, Monthly Notices of the Royal Astronomical Society)

ABSTRACT Feedback from active galactic nuclei (AGNs) and supernovae can affect measurements of the integrated Sunyaev–Zeldovich (SZ) flux of haloes ($Y_{\rm SZ}$) from cosmic microwave background (CMB) surveys, and cause its relation with halo mass ($Y_{\rm SZ}$–$M$) to deviate from the self-similar power-law prediction of the virial theorem. We perform a comprehensive study of such deviations using CAMELS, a suite of hydrodynamic simulations with extensive variations in feedback prescriptions. We use a combination of two machine learning tools (random forest and symbolic regression) to search for analogues of the $Y$–$M$ relation that are more robust to feedback processes at low masses ($M \lesssim 10^{14}\,h^{-1}\,M_\odot$); we find that simply replacing $Y \rightarrow Y(1 + M_*/M_{\rm gas})$ in the relation makes it remarkably self-similar. This could serve as a robust multiwavelength mass proxy for low-mass clusters and galaxy groups. Our methodology can also be generally useful for extending the domain of validity of other astrophysical scaling relations. We also forecast that measurements of the $Y$–$M$ relation could provide per cent level constraints on certain combinations of feedback parameters and/or rule out a major part of the parameter space of supernova and AGN feedback models used in current state-of-the-art hydrodynamic simulations. Our results can be useful for using upcoming SZ surveys (e.g. SO, CMB-S4) and galaxy surveys (e.g. DESI and Rubin) to constrain the nature of baryonic feedback. Finally, we find that the alternative relation, $Y$–$M_*$, provides information on feedback that is complementary to that from $Y$–$M$.
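The replacement $Y \rightarrow Y(1 + M_*/M_{\rm gas})$ can be checked against the self-similar expectation by fitting a log-log slope. At fixed redshift, self-similar scaling predicts $Y \propto M^{5/3}$. The Python sketch below is a toy check on synthetic haloes, not CAMELS data. The mass-dependent suppression of $Y$ (a stellar-to-gas ratio that falls with mass) is a hypothetical choice made so that the correction undoes it exactly. The function names are illustrative.

```python
import numpy as np

SELF_SIMILAR_SLOPE = 5.0 / 3.0  # Y ∝ M^(5/3) at fixed redshift


def modified_y(y, m_star, m_gas):
    """Apply the replacement Y -> Y * (1 + M*/Mgas) from the abstract."""
    y, m_star, m_gas = (np.asarray(v, dtype=float) for v in (y, m_star, m_gas))
    return y * (1.0 + m_star / m_gas)


def loglog_slope(m, y):
    """Least-squares slope of log10(Y) against log10(M)."""
    slope, _intercept = np.polyfit(np.log10(m), np.log10(y), 1)
    return float(slope)


# Hypothetical toy haloes: Y is suppressed by 1 / (1 + M*/Mgas), and the
# stellar-to-gas ratio shrinks with mass, which steepens the raw Y-M slope.
m = np.logspace(12, 14, 10)
m_gas = np.ones_like(m)
m_star = 0.5 * (m / 1e12) ** -0.3
y = m ** SELF_SIMILAR_SLOPE / (1.0 + m_star / m_gas)
```

In this toy, `loglog_slope(m, y)` comes out steeper than 5/3. After the replacement, `loglog_slope(m, modified_y(y, m_star, m_gas))` recovers 5/3. Real simulated haloes would not be corrected this cleanly.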
HIFlow: Generating Diverse H I Maps and Inferring Cosmology while Marginalizing over Astrophysics Using Normalizing Flows

https://doi.org/10.3847/1538-4357/ac8b09

Hassan, Sultan; Villaescusa-Navarro, Francisco; Wandelt, Benjamin; Spergel, David N.; Anglés-Alcázar, Daniel; Genel, Shy; Cranmer, Miles; Bryan, Greg L.; Davé, Romeel; Somerville, Rachel S.; et al (September 2022, The Astrophysical Journal)

Abstract A wealth of cosmological and astrophysical information is expected from many ongoing and upcoming large-scale surveys. It is crucial to prepare for these surveys now and to develop tools that can efficiently extract most of this information. We present HIFlow: a fast generative model of neutral hydrogen (H I) maps that is conditioned only on cosmology ($\Omega_{\rm m}$ and $\sigma_8$) and designed using a class of normalizing flow models, the masked autoregressive flow. HIFlow is trained on the state-of-the-art simulations from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project. HIFlow has the ability to generate realistic, diverse maps without explicitly incorporating the expected two-dimensional map structure into the flow as an inductive bias. We find that HIFlow is able to reproduce the CAMELS average and standard deviation of the H I power spectrum within a factor of $\lesssim 2$, scoring a very high $R^2 > 90\%$. By inverting the flow, HIFlow provides a tractable high-dimensional likelihood for efficient parameter inference. We show that HIFlow, conditioned on cosmology, is able to marginalize over astrophysics at the field level, regardless of the stellar and AGN feedback strengths. This new tool represents a first step toward more powerful parameter inference, maximizing the scientific return of future H I surveys, and opening a new avenue to minimize the loss of complex information due to data compression down to summary statistics.
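The tractable likelihood mentioned above comes from the change-of-variables formula applied to an autoregressive flow. The Python sketch below shows this for a single affine autoregressive (MAF-style) layer with a standard-normal base density. It is a minimal illustration, not HIFlow's implementation. In a real MAF the conditioners are masked neural networks (and in HIFlow they would also see the cosmological parameters). Here `mu_fn` and `log_sigma_fn` are hypothetical hand-written functions of the earlier dimensions only.

```python
import math

import numpy as np


def maf_log_likelihood(x, mu_fn, log_sigma_fn):
    """log p(x) for one affine autoregressive layer over a N(0, 1) base.

    Inverse pass: u_i = (x_i - mu_i(x_<i)) * exp(-log_sigma_i(x_<i)).
    mu_fn / log_sigma_fn are hypothetical conditioners mapping the prefix
    x_<i (a NumPy array, empty for i = 0) to a float.
    """
    x = np.asarray(x, dtype=float)
    log_p = 0.0
    for i in range(x.size):
        prefix = x[:i]  # autoregressive: only earlier dimensions are visible
        mu = mu_fn(prefix)
        log_sigma = log_sigma_fn(prefix)
        u = (x[i] - mu) * math.exp(-log_sigma)
        # Standard-normal log density of u plus log|du_i/dx_i| = -log_sigma;
        # the Jacobian is triangular, so its log-determinant is this sum.
        log_p += -0.5 * (u * u + math.log(2.0 * math.pi)) - log_sigma
    return float(log_p)
```

As a sanity check, take one dimension with `mu_fn = lambda p: 0.0` and `log_sigma_fn = lambda p: math.log(2.0)`. The layer then reduces to a Gaussian with standard deviation 2. Evaluating the likelihood needs only this single inverse pass, which is why an autoregressive flow is convenient for density evaluation.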
A Deep-learning Approach for Live Anomaly Detection of Extragalactic Transients

https://doi.org/10.3847/1538-4365/ac0893

Villar, V. Ashley; Cranmer, Miles; Berger, Edo; Contardo, Gabriella; Ho, Shirley; Hosseinzadeh, Griffin; Lin, Joshua Yao-Yu (August 2021, The Astrophysical Journal Supplement Series)
